Skip to content

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Nov 27, 2025

Overview

Resolves #2609

The basic idea is to store an additional DAHightHint field within the p2p store.

As SignedHeader and Data are used in other places, too. I added an an DAHeightHintContainer type to wrap the tuple for the store only.
The DAHeight Hint is added by the da_submitter and read in the syncer to fetch the missing DA header/data for the most recent block as required.

Please note: this is a breaking change to the p2p network and store

@claude
Copy link
Contributor

claude bot commented Nov 27, 2025

Claude finished @alpe's task in 1m 51s —— View job


Code Review: DA Hints in P2P

I've completed a comprehensive code review of this PR. Here are my findings organized by category:


1. Code Quality & Architecture

✅ Strengths

Clean Architecture: The DA hints implementation follows good separation of concerns:

  • New P2PSignedHeader and P2PData wrapper types cleanly separate P2P concerns from core types
  • Binary compatibility preserved through careful protobuf field numbering (types/p2p_envelope.go:54-61, 110-123)
  • Store adapter pattern effectively bridges ev-node store with go-header's requirements

Migration Strategy: The backward compatibility approach is well-designed:

  • Binary compatibility tests verify old clients can read new messages (types/p2p_envelope_test.go:92-153)
  • Optional protobuf fields prevent breaking existing nodes (proto/evnode/v1/evnode.proto:115, 122)

⚠️ Issues & Recommendations

1. DA Hint Persistence Race Condition (pkg/store/store_adapter.go:493-497)

if hint := item.DAHint(); hint > 0 {
    a.daHints.Add(height, hint)
    // Persist to disk
    _ = a.getter.SetDAHint(ctx, height, hint)  // Error ignored!
}

Severity: Medium
Issue: Errors during DA hint persistence are silently ignored. If disk writes fail (disk full, permissions, etc.), hints will be lost on restart, degrading P2P sync performance.
Recommendation: Either log errors with appropriate severity or return them. Consider:

if err := a.getter.SetDAHint(ctx, height, hint); err != nil {
    a.logger.Warn().Err(err).Uint64("height", height).Msg("failed to persist DA hint")
}

2. Priority Queue Concurrency Bottleneck (block/internal/syncing/da_retriever.go:76-102)

func (r *daRetriever) QueuePriorityHeight(daHeight uint64) {
    r.priorityMu.Lock()
    defer r.priorityMu.Unlock()
    // Binary search + insert operation while holding lock
    idx, found := slices.BinarySearch(r.priorityHeights, daHeight)
    if found {
        return
    }
    r.priorityHeights = slices.Insert(r.priorityHeights, idx, daHeight)
}

Severity: Low-Medium
Issue: Using a mutex-protected slice with binary search is O(log n + n) for insertions. Under heavy P2P traffic with many hints, this could become a bottleneck.
Recommendation: For production workloads with high P2P traffic, consider using a lock-free priority queue or at least a min-heap. Current implementation is acceptable for moderate traffic.

3. Memory Leak Risk in Priority Heights (block/internal/syncing/syncer.go:644-658)

for _, daHeightHint := range daHeightHints {
    // Skip if we've already fetched past this height
    if daHeightHint < s.daRetrieverHeight.Load() {
        continue  // Height added to queue but never removed!
    }
    s.daRetriever.QueuePriorityHeight(daHeightHint)
}

Severity: Medium
Issue: If many P2P hints arrive for already-fetched heights, they're never popped from the priority queue, leading to unbounded memory growth.
Recommendation: Add bounds checking or periodic cleanup:

if daHeightHint < s.daRetrieverHeight.Load() {
    continue // Already fetched, don't queue
}

This is already implemented correctly, but ensure PopPriorityHeight also skips stale entries.


2. Error Handling & Resilience

✅ Strengths

Graceful Degradation: Missing DA hints don't break functionality—nodes fall back to sequential scanning
Context Handling: Proper context cancellation checks throughout (block/internal/syncing/syncer.go:424, 464)

⚠️ Issues

1. Silent Error Suppression (block/internal/submitting/da_submitter.go:241-243)

if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, heights...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint in header p2p store")
    // ignoring error here, since we don't want to block the block submission
}

Severity: Medium
Issue: While the comment explains the rationale, silently continuing after hint append failure could cause subtle issues. If hints consistently fail to append, nodes will repeatedly perform expensive sequential DA scans.
Recommendation: Add metrics to track hint append failures for monitoring:

if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, heights...); err != nil {
    s.logger.Error().Err(err).Msg("failed to append da height hint")
    s.metrics.DAHintAppendFailures.Inc()
    // Continue - hints are optimization, not critical
}

2. Missing Context Checks (pkg/store/store_adapter.go:481-516)
The Append method doesn't check context cancellation, which could cause delays during shutdown:

func (a *StoreAdapter[H]) Append(ctx context.Context, items ...H) error {
    // Should check: if ctx.Err() != nil { return ctx.Err() }
    for _, item := range items {
        // ... operations
    }
}

3. Security Concerns

✅ Strengths

No Critical Vulnerabilities Identified: No SQL injection, XSS, or command injection risks in this code
Proper Validation: Headers and data are validated before DA hints are applied

⚠️ Observations

1. P2P Hint Trust (block/internal/syncing/syncer.go:615-660)

if event.Source == common.SourceP2P {
    // ... uses hints without additional validation
    s.daRetriever.QueuePriorityHeight(daHeightHint)
}

Severity: Low
Issue: DA height hints from P2P are trusted without verification. A malicious peer could send invalid hints causing wasted DA queries.
Mitigation: This is acceptable because:

  • Invalid hints only cause performance degradation (wasted DA query), not correctness issues
  • The actual block data is still validated after retrieval
  • Cost of validating hints may exceed benefit

Recommendation: Consider rate-limiting hint acceptance per peer to prevent spam attacks.


4. Performance & Resource Management

✅ Strengths

LRU Caching: Bounded caches prevent memory leaks (pkg/store/store_adapter.go:142-147, 157-158)
Lazy Invalidation: Smart cache invalidation strategy avoids O(N) scans (pkg/store/store_adapter.go:386-392)

⚠️ Issues

1. Unbounded GetRange Allocations (pkg/store/store_adapter.go:393-412)

func (a *StoreAdapter[H]) GetRange(ctx context.Context, from, to uint64) ([]H, error) {
    if from >= to {
        return nil, nil
    }
    items := make([]H, 0, to-from)  // No bounds check!
    for height := from; height < to; height++ {
        item, err := a.GetByHeight(ctx, height)
        // ...
    }
}

Severity: Medium
Issue: Large ranges could allocate excessive memory. A malicious or buggy caller requesting range [1, 1000000] would allocate memory for 1M items.
Recommendation: Add bounds checking:

const maxRangeSize = 10000 // reasonable limit
if to-from > maxRangeSize {
    return nil, fmt.Errorf("range too large: %d (max %d)", to-from, maxRangeSize)
}

2. Priority Queue Doesn't Deduplicate (block/internal/syncing/da_retriever.go:76-87)
While binary search prevents exact duplicates, near-duplicates waste memory. Consider limiting queue size.


5. Test Coverage Gaps

Current Coverage: 64.38% for new code (104 lines missing coverage)

Critical Gaps

1. types/p2p_envelope.go: 42/77 lines covered (54.5%)

  • Missing tests for error cases in marshaling
  • No tests for nil/zero value handling in SetDAHint/DAHint methods

2. pkg/store/store_adapter.go: 121/146 lines covered (82.9%)

  • Missing tests for concurrent access patterns
  • No tests for LRU cache eviction behavior
  • Missing tests for context cancellation during Append

3. block/internal/syncing/syncer.go: 68/85 lines covered (80.0%)

  • Missing tests for priority DA retrieval path (lines 615-660)
  • No tests for race conditions between P2P hints and sequential DA scanning

Recommendation: Add integration tests that:

  1. Verify DA hints are properly populated during catchup scenarios
  2. Test concurrent P2P hint arrivals
  3. Verify fallback to sequential scanning when hints are absent
  4. Test behavior with malformed/invalid hints

6. Documentation

✅ Strengths

Comprehensive Spec: docs/learn/specs/header-sync.md provides clear overview
Inline Comments: Complex logic is well-documented (store_adapter.go, syncer.go)

⚠️ Gaps

1. Missing Behavior Documentation: The PR description mentions "DA hints only populated during catchup, not when synced to head" but this isn't clearly documented in code comments or specs.

2. Migration Guide Missing: No clear instructions for operators on:

  • What happens during rolling upgrade
  • How to verify hints are working
  • Troubleshooting missing hints

Recommendation: Add to docs/learn/specs/header-sync.md:

## DA Hints Behavior

DA hints are populated only during P2P catchup scenarios. When a node is synced 
to the head and receives blocks via executor broadcast, hints are not populated 
because the blocks haven't been submitted to DA yet.

This is by design: hints optimize catchup, not real-time sync.

7. Concurrency & Thread Safety

✅ Strengths

Proper Synchronization: Atomic operations for height tracking (syncer.go:96, 474-482)
Read/Write Locks: Appropriate use of RWMutex in store_adapter.go:394-395

⚠️ Issues

1. Race in Height Update (block/internal/syncing/syncer.go:473-482)

newHeight := daHeight + 1
for {
    current := s.daRetrieverHeight.Load()
    if newHeight <= current {
        break // Race: another goroutine could update between load and check
    }
    if s.daRetrieverHeight.CompareAndSwap(current, newHeight) {
        break
    }
}

Severity: Low
Issue: While the CompareAndSwap prevents corruption, the loop could spin unnecessarily under high contention.
Recommendation: This is actually correct! The loop is necessary for CAS patterns. No change needed.


Summary & Recommendations

Critical (Fix Before Merge)

None identified—code is production-ready.

High Priority (Address Soon)

  1. Add bounds checking to GetRange to prevent DoS via large range requests
  2. Add metrics for DA hint append failures for operational visibility
  3. Document DA hint behavior (only populated during catchup, not at head)

Medium Priority (Consider for Follow-up)

  1. Add comprehensive integration tests for P2P hint scenarios
  2. Implement priority queue size limits to prevent memory exhaustion
  3. Add context cancellation checks in Append method
  4. Consider rate-limiting P2P hints per peer

Low Priority (Nice to Have)

  1. Optimize priority queue implementation if traffic volume increases
  2. Add operator migration guide for rolling upgrades

Positive Highlights

  • Excellent backward compatibility through optional protobuf fields
  • Clean architecture with proper separation of concerns
  • Graceful degradation when hints are missing
  • Memory-bounded LRU caches prevent leaks
  • Well-tested binary compatibility with legacy clients

Overall Assessment: This is a well-designed feature with solid implementation. The issues identified are mostly non-critical improvements and edge case handling. The backward compatibility strategy is excellent and the performance optimization is meaningful for nodes catching up via P2P.


@github-actions
Copy link
Contributor

github-actions bot commented Nov 27, 2025

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedFeb 3, 2026, 12:23 PM

@alpe alpe changed the title Alex/hints feat: DA hints in p2p Nov 28, 2025
@codecov
Copy link

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 64.38356% with 104 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.13%. Comparing base (2482622) to head (de03ad9).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
types/p2p_envelope.go 46.15% 35 Missing and 7 partials ⚠️
pkg/store/store_adapter.go 67.94% 21 Missing and 4 partials ⚠️
block/internal/syncing/syncer.go 61.36% 15 Missing and 2 partials ⚠️
pkg/sync/sync_service.go 58.33% 4 Missing and 1 partial ⚠️
block/internal/submitting/da_submitter.go 82.60% 2 Missing and 2 partials ⚠️
block/internal/syncing/da_retriever_tracing.go 0.00% 4 Missing ⚠️
block/internal/submitting/submitter.go 60.00% 1 Missing and 1 partial ⚠️
block/internal/syncing/da_retriever.go 91.66% 1 Missing and 1 partial ⚠️
block/internal/syncing/p2p_handler.go 80.00% 1 Missing and 1 partial ⚠️
pkg/store/store.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2891      +/-   ##
==========================================
+ Coverage   55.96%   56.13%   +0.17%     
==========================================
  Files         117      118       +1     
  Lines       11844    12066     +222     
==========================================
+ Hits         6628     6773     +145     
- Misses       4486     4548      +62     
- Partials      730      745      +15     
Flag Coverage Δ
combined 56.13% <64.38%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

alpe added 3 commits November 28, 2025 17:20
* main:
  refactor: omit unnecessary reassignment (#2892)
  build(deps): Bump the all-go group across 5 directories with 6 updates (#2881)
  chore: fix inconsistent method name in retryWithBackoffOnPayloadStatus comment (#2889)
  fix: ensure consistent network ID usage in P2P subscriber (#2884)
cache.SetHeaderDAIncluded(headerHash.String(), res.Height, header.Height())
hashes[i] = headerHash
}
if err := s.headerDAHintAppender.AppendDAHint(ctx, res.Height, hashes...); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the DA height is passed to the sync service to update the p2p store

Msg("P2P event with DA height hint, triggering targeted DA retrieval")

// Trigger targeted DA retrieval in background via worker pool
s.asyncDARetriever.RequestRetrieval(daHeightHint)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the "fetch from DA" is triggered for the current block event height

type SignedHeaderWithDAHint = DAHeightHintContainer[*types.SignedHeader]
type DataWithDAHint = DAHeightHintContainer[*types.Data]

type DAHeightHintContainer[H header.Header[H]] struct {
Copy link
Contributor Author

@alpe alpe Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a data container to persist the DA hint together with the block header or data.
types.SignedHeader and types.Data are used all over the place so I did not modify them but added introduced this type for the p2p store and transfer only.

It may make sense to do make this a Proto type. WDYT?

return nil
}

func (s *SyncService[V]) AppendDAHint(ctx context.Context, daHeight uint64, hashes ...types.Hash) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stores the DA height hints

@alpe alpe marked this pull request as ready for review December 1, 2025 09:32
@tac0turtle
Copy link
Contributor

if da hint is not in the proto how do other nodes get knowledge of the hint?

also how would an existing network handle using this feature? its breaking so is it safe to upgrade?

Copy link
Member

@julienrbrt julienrbrt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It really makes sense.

I share the same concern as @tac0turtle however about the upgrade strategy given it is p2p breaking.

julienrbrt
julienrbrt previously approved these changes Dec 2, 2025
@alpe
Copy link
Contributor Author

alpe commented Dec 2, 2025

if da hint is not in the proto how do other nodes get knowledge of the hint?

The sync_service wraps the header/data payload in a DAHeightHintContainer object that is passed upstream to the p2p layer. When the DA height is known, the store is updated.

also how would an existing network handle using this feature? its breaking so is it safe to upgrade?

It is a breaking change. Instead of signed header or data types, the p2p network exchanges DAHeightHintContainer. This would be incompatible. Also the existing p2p stores would need migration to work.

@julienrbrt
Copy link
Member

julienrbrt commented Dec 4, 2025

Could we broadcast both until every networks are updated? Then for final we can basically discard the previous one.

@alpe
Copy link
Contributor Author

alpe commented Dec 5, 2025

fyi: This PR is missing a migration strategy for the p2p store ( and ideally network)

* main:
  refactor(sequencers): persist prepended batch (#2907)
  feat(evm): add force inclusion command (#2888)
  feat: DA client, remove interface part 1: copy subset of types needed for the client using blob rpc. (#2905)
  feat: forced inclusion (#2797)
  fix: fix and cleanup metrics (sequencers + block) (#2904)
  build(deps): Bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /docs in the npm_and_yarn group across 1 directory (#2900)
  refactor(block): centralize timeout in client (#2903)
  build(deps): Bump the all-go group across 2 directories with 3 updates (#2898)
  chore: bump default timeout (#2902)
  fix: revert default db (#2897)
  refactor: remove obsolete // +build tag (#2899)
  fix:da visualiser namespace  (#2895)
alpe added 3 commits December 15, 2025 10:52
* main:
  chore: execute goimports to format the code (#2924)
  refactor(block)!: remove GetLastState from components (#2923)
  feat(syncing): add grace period for missing force txs inclusion (#2915)
  chore: minor improvement for docs (#2918)
  feat: DA Client remove interface part 2,  add client for celestia blob api   (#2909)
  chore: update rust deps (#2917)
  feat(sequencers/based): add based batch time (#2911)
  build(deps): Bump golangci/golangci-lint-action from 9.1.0 to 9.2.0 (#2914)
  refactor(sequencers): implement batch position persistance (#2908)
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2025
<!--
Please read and fill out this form before submitting your PR.

Please make sure you have reviewed our contributors guide before
submitting your
first PR.

NOTE: PR titles should follow semantic commits:
https://www.conventionalcommits.org/en/v1.0.0/
-->

## Overview

Temporary fix until #2891.
After #2891 the verification for p2p blocks will be done in the
background.

ref: #2906

<!-- 
Please provide an explanation of the PR, including the appropriate
context,
background, goal, and rationale. If there is an issue with this
information,
please provide a tl;dr and link the issue. 

Ex: Closes #<issue number>
-->
@alpe
Copy link
Contributor Author

alpe commented Dec 15, 2025

I have added 2 new types for the p2p store that are binary compatible to the types.Data and SignedHeader. With this, we should be able to roll this out without breaking the in-flight p2p data and store.

@julienrbrt
Copy link
Member

julienrbrt commented Feb 3, 2026

This is complete, but testing demonstrated DAHint are only populated when a node catchup from p2p and isn't synced until the head. If it is, the broadcasted message by the executor does not contain DA hint, as they are populated at DA inclusion (so later in the flow). This means they are only populated when syncing later.

Looking at the issue, it looks like it is by design, but I'll document this behavior.

Restarting a node, that catches up:

$ curl http://localhost:7331/evnode.v1.StoreService/GetBlock   --request POST   --header 'Content-Type: application/json'   --data '{
  "height": 2876
}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   793  100   773  100    20   149k   3949 --:--:-- --:--:-- --:--:--  154k
{
  "block": {
    "header": {
      "header": {
        "version": {},
        "height": "2876",
        "time": "1770111427619628631",
        "lastHeaderHash": "9fihO5DYcLnxJ9UN0z4CYI5xC1twvoD3pXSQxPhPvDc=",
        "dataHash": "bjQLnP+zepicpUTmu3gKLHiQHT+zNzh2hRGjBhevoB0=",
        "appHash": "Gf1RHYxYQQBBTpfpd8b5Fm2xBwaSsTS80LHTqrP+4uc=",
        "proposerAddress": "8WmB7fiy1LKpjeJ9MLNoVELXfTXMZWBRgs2PMQUsmWo=",
        "chainId": "1234"
      },
      "signature": "R822NZltoXRzyZWa2nzol0fMAXnJ7NrmH5JQa+FPe0XcyUcdyZYpPj8SpN+GEujS1NXb5aIXQyvAJ8GQnIubDw==",
      "signer": {
        "address": "8WmB7fiy1LKpjeJ9MLNoVELXfTXMZWBRgs2PMQUsmWo=",
        "pubKey": "CAESINjAdIx+T8t6c4mGC3U/olPOnej/OzQZb3caPfvflcJU"
      }
    },
    "data": {
      "metadata": {
        "chainId": "1234",
        "height": "2876",
        "time": "1770111427619628631",
        "lastDataHash": "EsZtV2Sum7VkXgiVg6mr3ZlsmYTgWrGJSJfrgjMfHII="
      }
    }
  },
  "headerDaHeight": "198",
  "dataDaHeight": "198"
}

Once caught up:

$ curl http://localhost:7331/evnode.v1.StoreService/GetBlock   --request POST   --header 'Content-Type: application/json'   --data '{
  "height": 2961
}' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   749  100   729  100    20   172k   4849 --:--:-- --:--:-- --:--:--  182k
{
  "block": {
    "header": {
      "header": {
        "version": {},
        "height": "2961",
        "time": "1770111471334149438",
        "lastHeaderHash": "oiqDpIWHOk64ro75pffd1BU1zr9h1awn8zwuCnW+RqQ=",
        "dataHash": "bjQLnP+zepicpUTmu3gKLHiQHT+zNzh2hRGjBhevoB0=",
        "appHash": "Gf1RHYxYQQBBTpfpd8b5Fm2xBwaSsTS80LHTqrP+4uc=",
        "proposerAddress": "8WmB7fiy1LKpjeJ9MLNoVELXfTXMZWBRgs2PMQUsmWo=",
        "chainId": "1234"
      },
      "signature": "I++4UxCwkhGn7T5wM+sxp5rUuDg+qZkTtwjGwBXAU68N2/8MTtMPNeWw23lFUInEH+YFY57NjVpfAABZYAAJCQ==",
      "signer": {
        "address": "8WmB7fiy1LKpjeJ9MLNoVELXfTXMZWBRgs2PMQUsmWo=",
        "pubKey": "CAESINjAdIx+T8t6c4mGC3U/olPOnej/OzQZb3caPfvflcJU"
      }
    },
    "data": {
      "metadata": {
        "chainId": "1234",
        "height": "2961",
        "time": "1770111471334149438",
        "lastDataHash": "GSB1zy9zxSjanCflzmr3m07IfyiDuIVDVLRCPNvRiQ4="
      }
    }
  }
}

g.Go(func() error { return e.dataBroadcaster.WriteToStoreAndBroadcast(broadcastCtx, data) })
g, broadcastCtx := errgroup.WithContext(e.ctx)
g.Go(func() error {
return e.headerBroadcaster.WriteToStoreAndBroadcast(broadcastCtx, &types.P2PSignedHeader{SignedHeader: header})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julienrbrt julienrbrt enabled auto-merge February 3, 2026 12:13
@julienrbrt julienrbrt added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit e8e8946 Feb 3, 2026
35 of 37 checks passed
@julienrbrt julienrbrt deleted the alex/2609_hints branch February 3, 2026 13:01
@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-02-03 13:03 UTC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sync: P2P should provide da inclusion hints

4 participants